A Hierarchical Technique for Constructing Efficient Declustering Schemes for Range Queries
Authors
Abstract
Multi-disk systems, coupled with declustering schemes, have been widely used in various applications to improve I/O performance by enabling parallel disk accesses. A declustering scheme determines how data blocks should be placed among multiple disks to maximize parallelism. We focus on the problem of declustering grid-structured multidimensional data with the objective of reducing the response time for range queries. Because of the combinatorial nature of the problem, it is not computationally feasible to perform an exhaustive search for the best scheme for large values of M (the number of disks). In this paper, we present an efficient technique for building well-performing declustering schemes for large values of M, based on known good declustering schemes for small values of M. We analyze the performance of the declustering schemes generated by this hierarchical technique, giving tight bounds on their query response times. For example, we show that in two dimensions, using optimal declustering schemes for M1 and M2 disks, we can construct a scheme for M1 × M2 disks whose response time, expressed as the maximum number of data blocks to be retrieved from any one disk, is at most five more than the optimal response time. Our technique generalizes to any value of M in two dimensions and to selected values of M in higher dimensions. We also present simulation results to show the effectiveness of these schemes in practice.
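The response-time measure used in the abstract can be illustrated with a minimal sketch. It assumes a two-dimensional grid of blocks and uses a simple modular placement, `(i + j) mod M`, purely as a stand-in scheme; this is not the hierarchical construction of the paper, and all function names below are hypothetical. The sketch counts how many blocks of a range query land on the busiest disk and compares that with the lower bound of ceil(|Q| / M) blocks, where |Q| is the number of blocks in the query.

```python
from collections import Counter
from math import ceil


def disk_modulo(i, j, m):
    """Illustrative placement (an assumption, not the paper's scheme):
    block (i, j) is stored on disk (i + j) mod m."""
    return (i + j) % m


def response_time(scheme, m, rows, cols):
    """Maximum number of blocks of the range query rows x cols
    that must be read from any single disk."""
    counts = Counter(scheme(i, j, m) for i in rows for j in cols)
    return max(counts.values())


def optimal_response_time(num_blocks, m):
    """Lower bound: the blocks cannot be spread more evenly than this."""
    return ceil(num_blocks / m)


def additive_error(scheme, m, rows, cols):
    """How far the scheme is from the lower bound for this one query."""
    num_blocks = len(rows) * len(cols)
    return response_time(scheme, m, rows, cols) - optimal_response_time(num_blocks, m)


if __name__ == "__main__":
    m = 5
    rows, cols = range(0, 3), range(0, 3)  # a 3 x 3 range query
    print(response_time(disk_modulo, m, rows, cols))  # 3
    print(optimal_response_time(9, m))                # 2
    print(additive_error(disk_modulo, m, rows, cols)) # 1
```

In these terms, the bound quoted in the abstract says that the two-dimensional scheme built for M1 × M2 disks has an additive error of at most five for range queries.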
Similar resources
Threshold-based declustering
Declustering techniques reduce query response time through parallel I/O by distributing data among multiple devices. Except for a few cases it is not possible to find declustering schemes that are optimal for all spatial range queries. As a result of this, most of the research on declustering has focused on finding schemes with low worst case additive error. However, additive error based scheme...
Efficient retrieval of multidimensional datasets through parallel I/O
Many scientific and engineering applications process large multidimensional datasets. An important access pattern for these applications is the retrieval of data corresponding to ranges of values in multiple dimensions. Performance is limited by disks largely due to high disk latencies. Tiling and distributing the data across multiple disks is an effective technique for improving performance th...
Efficient Disk Allocation for Fast Similarity Searching
As databases increasingly integrate non-textual information it is becoming necessary to support efficient similarity searching in addition to range searching. Recently, declustering techniques have been proposed for improving the performance of similarity searches through parallel I/O. In this paper, we propose a new scheme which provides good declustering for similarity searching. In particular...
Concentric Hyperspaces and Disk Allocation for Fast Parallel Range Searching
Data partitioning and declustering have been extensively used in the past to parallelize I/O for range queries. Numerous declustering and disk allocation techniques have been proposed in the literature. However, most of these techniques were primarily designed for two-dimensional data and for balanced partitioning of the data space. As databases increasingly integrate multimedia information in ...
Selective Replicated Declustering for Arbitrary Queries
Data declustering is used to minimize query response times in data intensive applications. In this technique, query retrieval process is parallelized by distributing the data among several disks and it is useful in applications such as geographic information systems that access huge amounts of data. Declustering with replication is an extension of declustering with possible data replicas in the...
Journal title:
- Comput. J.
Volume 46, Issue
Pages -
Publication date: 2003